On the largest empty axis-parallel box amidst n points 

Adrian Dumitrescu* Minghui Jiang 



November 23, 2009 



Abstract 



We give the first nontrivial upper and lower bounds on the maximum volume of an empty
axis-parallel box inside an axis-parallel unit hypercube in $\mathbb{R}^d$ containing $n$ points. For a fixed $d$, we show that
the maximum volume is of the order $\Theta(1/n)$. We then use the fact that the maximum volume is $\Omega(1/n)$
in our design of the first efficient $(1-\varepsilon)$-approximation algorithm for the following problem: Given
an axis-parallel $d$-dimensional box $R$ in $\mathbb{R}^d$ containing $n$ points, compute a maximum-volume empty
axis-parallel $d$-dimensional box contained in $R$. The running time of our algorithm is nearly linear in
$n$, for small $d$, and increases only by an $O(\log n)$ factor when one goes up one dimension. No previous
efficient exact or approximation algorithms were known for this problem for $d \ge 4$. As the problem has
been recently shown to be NP-hard in arbitrary high dimensions (i.e., when $d$ is part of the input), the
existence of efficient exact algorithms is unlikely.

We also obtain tight estimates on the maximum volume of an empty axis-parallel hypercube inside
an axis-parallel unit hypercube in $\mathbb{R}^d$ containing $n$ points. For a fixed $d$, this maximum volume is of the
same order $\Theta(1/n)$. A faster $(1-\varepsilon)$-approximation algorithm, with a milder dependence on $d$ in
the running time, is obtained in this case.
1 Introduction



Given a set $S$ of $n$ points in the unit square $U = [0,1]^2$, let $A(S)$ be the maximum area of an empty
axis-parallel rectangle contained in $U$, and let $A(n)$ be the minimum value of $A(S)$ over all sets $S$ of $n$ points
in $U$. For any dimension $d \ge 2$, given a set $S$ of $n$ points in the unit hypercube $U_d = [0,1]^d$, let $A_d(S)$ be
the maximum volume of an empty axis-parallel hyperrectangle ($d$-dimensional axis-parallel box) contained
in $U_d$, and let $A_d(n)$ be the minimum value of $A_d(S)$ over all sets of $n$ points in $U_d$. For simplicity we
sometimes omit the subscript $d$ in the planar case ($d = 2$).

In this paper we give the first nontrivial upper and lower bounds on $A_d(n)$. For any dimension $d$, our
estimates are within a multiplicative constant (depending on $d$) from each other. For a fixed $d$, we show that
the maximum volume is of the order $\Theta(1/n)$. While the algorithmic problem of finding an empty axis-parallel
box of maximum volume has been previously studied for $d = 2, 3$ (see below), estimating the maximum
volume of such a box as a function of $d$ and $n$ seems not to have been previously considered.

We first introduce some notations and definitions. Throughout this paper, a box is an open axis-parallel
hyperrectangle contained in the unit hypercube $U_d = [0,1]^d$, $d \ge 2$. Given a set $S$ of points in $U_d$, a box $B$
is empty if it contains no points in $S$, i.e., $B \cap S = \emptyset$. If $B$ is a box, we also refer to the side length of $B$
in the $i$th coordinate as the extent in the $i$th coordinate of $B$. Throughout this paper, $\log n$ and $\ln n$ are the
logarithms of $n$ in base 2 and $e$, respectively.

Given an axis-parallel rectangle $R$ in the plane containing $n$ points, the problem of computing a
maximum-area empty axis-parallel sub-rectangle contained in $R$ is one of the oldest problems studied in computational
geometry. For instance, this problem arises when a rectangular shaped facility is to be located within a
similar region which has a number of forbidden areas, or in cutting out a rectangular piece from a large similarly
shaped metal sheet with some defective spots to be avoided [18]. In higher dimensions, finding the largest
empty axis-parallel box has applications in data mining, in finding large gaps in a multi-dimensional data
set [13].

Several algorithms have been proposed for the planar problem over the years [1, 2, 3, 8, 11, 17, 18, 19].
For instance, an early algorithm by Chazelle, Drysdale and Lee [8] runs in $O(n \log^3 n)$ time and $O(n \log n)$
space. The fastest known algorithm, proposed by Aggarwal and Suri in 1987 [1], runs in $O(n \log^2 n)$ time
and $O(n)$ space. A lower bound of $\Omega(n \log n)$ in the algebraic decision tree model for this problem has been
shown by McKenna et al. [17].

For any dimension $d$, there is an obvious brute-force algorithm running in $O(n^{2d+1})$ time and $O(n)$
space. No significantly faster algorithms, i.e., with a fixed-degree polynomial running time in $\mathbb{R}^d$, were
known. Confirming this state of affairs, Backer and Keil miH recently proved that the problem is NP-hard
in arbitrary high dimensions (i.e., when $d$ is part of the input). They also gave an exact algorithm running
in $O(n^d \log^{d-2} n)$ time, for any $d \ge 3$. In particular, the running time of their exact algorithm for $d = 3$ is
$O(n^3 \log n)$. Previously, Datta and Soundaralakshmi [12] had reported an $O(n^3)$ time exact algorithm for
the $d = 3$ case, but their analysis for the running time seems incomplete. Specifically, the $O(n^3)$ running
time depends on an $O(n^3)$ upper bound on the number of maximal empty boxes (see discussions in the
next paragraph), but they only gave an $\Omega(n^3)$ lower bound. Here we present the first efficient
$(1-\varepsilon)$-approximation algorithm for finding an axis-parallel empty box of maximum volume, whose running time
is nearly linear for small $d$, and increases only by an $O(\log n)$ factor when one goes up one dimension.

An empty box of maximum volume must be maximal with respect to inclusion. Following the
terminology in [18], a maximal empty box is called restricted. Thus the maximum-volume empty box in $U_d$ is
restricted. Naamad et al. [18] have shown that in the plane, the number of restricted rectangles is $O(n^2)$, and
that this bound is tight. It was conjectured by Datta and Soundaralakshmi [12] that the maximum number
of restricted boxes is $O(n^d)$ for each (fixed) $d$. The conjecture has been recently confirmed by Backer and
Keil [5, 6] (for $d \ge 3$). Here we extend (Theorem 7, Appendix D) the constructions with $\Omega(n^d)$ restricted
boxes for $d = 2$ in [18] and $d = 3$ in [12] for arbitrary $d$. Independently and simultaneously, Backer and
Keil have also obtained this result HHlllH. Hence the maximum number of restricted boxes is $\Theta(n^d)$ for
each fixed $d$. This means that any algorithm for computing a maximum-volume empty box based on
enumerating restricted boxes is inefficient in the worst case. On the other hand, at the expense of giving a
$(1-\varepsilon)$-approximation, our algorithm does not enumerate all restricted boxes, and achieves efficiency by
enumerating all canonical boxes (to be defined) instead.



Our results are: 

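(I) In Section 2 we show that $A_d(n) = \Theta(1/n)$ for $d \ge 2$. More precisely: $A_d(n) \ge \frac{1}{n+1}$ and
$A_d(n) \ge \left(\frac{5}{4} - o(1)\right) \cdot \frac{1}{n}$. From the other direction we have $A_2(n) < 4 \cdot \frac{1}{n}$, and
$A_d(n) < \left(2^{d-1} \prod_{i=1}^{d-1} p_i\right) \cdot \frac{1}{n}$ for any $d \ge 3$. Here $p_i$ is the $i$th prime.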

(II) In Section 3 we exploit the fact that the maximum volume is $\Omega(1/n)$ in our design of the first
efficient $(1-\varepsilon)$-approximation algorithm for finding the largest empty box: Given an axis-parallel
$d$-dimensional box $R$ in $\mathbb{R}^d$ containing $n$ points, there is a $(1-\varepsilon)$-approximation algorithm, running in
$O\left((8ed\varepsilon^{-2})^d \cdot n \log^d n\right)$ time, for computing a maximum-volume empty axis-parallel box contained
in $R$.

(III) In Appendix B we show that the $\Theta(1/n)$ estimate also holds for the maximum volume (or area) of
an axis-aligned hypercube (or square) amidst $n$ points in $[0,1]^d$. In Appendix C we present a faster
$(1-\varepsilon)$-approximation algorithm for finding the largest empty hypercube: Given an axis-parallel
$d$-dimensional hypercube $R$ in $\mathbb{R}^d$ containing $n$ points, there is a $(1-\varepsilon)$-approximation algorithm,
running in 0{d'^e~^ ■ nlogn + {Ade~^)'^'^^ ■ n^/'^logn) time, for computing a maximum-volume 
empty axis-parallel hypercube contained in R. 

(IV) In Appendix D we derive an $\Omega(n^d)$ lower bound on the number of restricted boxes in $d$-space, for
fixed $d$. This matches the recent $O(n^d)$ upper bound of Backer and Keil fHHl. Following their idea,
we further narrow the gap between the bounds (in the dependence of $d$) based on a finer estimation.



2 Empty rectangles and boxes 
2.1 Empty rectangles in the plane 

The lower bound. We start with a very simple-minded lower bound; however, as it turns out, it is very
close to optimal. One can immediately see that $A(n) = \Omega(1/n)$, by partitioning the unit square with vertical
lines through each point: out of at most $n + 1$ resulting empty rectangles, the largest rectangle has area at
least $\frac{1}{n+1}$. Thus we have:

Proposition 1.

$$A(n) \ge \frac{1}{n+1}. \tag{1}$$

The following observation is immediate from invariance under scaling with respect to any of the coor- 
dinate axes. 

Observation 1. Assume that $A(n) \ge z$ holds for some $n$ and $z$. Then, given $n$ points in an axis-aligned
rectangle $R$, there is an empty rectangle contained in $R$ of area at least $z \cdot \mathrm{area}(R)$.

Using the next two lemmas we will slightly improve the trivial lower bound $A(n) \ge \frac{1}{n+1}$ in our next
Theorem 1. Let $\xi = \frac{3-\sqrt{5}}{2}$ be the solution in $(0, 1)$ of the quadratic equation $(1 - x)^2 = x$.

Lemma 1. Given 2 points in the unit square, there exists an empty rectangle with area at least $\frac{3-\sqrt{5}}{2}$. This
bound is tight, i.e., $A(2) = \frac{3-\sqrt{5}}{2} = 0.3819\ldots$.

Proof. Let $p_1, p_2 \in U$, and assume without loss of generality that $x(p_1) \le x(p_2)$, and $y(p_1) \ge y(p_2)$.
Write $x = x(p_1)$, and $y = y(p_2)$. Consider the three empty rectangles $(0, x) \times (0, 1)$, $(0, 1) \times (0, y)$, and
$(x, 1) \times (y, 1)$. Their areas are $x$, $y$, and $(1 - x)(1 - y)$, respectively. If $x \ge \xi$ or $y \ge \xi$, we are done, as one
of the first two rectangles has area at least $\xi$. So we can assume that $x \le \xi$ and $y \le \xi$. Then it follows that

$$(1 - x)(1 - y) \ge (1 - \xi)^2 = \xi,$$

so the third rectangle has area at least $\xi$, as required.

To see that this bound is tight, take $p_1 = (\xi, 1 - \xi)$, $p_2 = (1 - \xi, \xi)$, and check that the largest empty
rectangle has area $\xi$. □
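The tightness claim of Lemma 1 can be checked numerically. The following is a minimal brute-force sketch, not part of the paper's method: the function name `largest_empty_rectangle_area` is hypothetical, and the sketch relies on the observation that a maximal empty rectangle has each side either on the boundary of the unit square or passing through one of the points, so it suffices to try all rectangles spanned by those coordinates.

```python
from itertools import combinations

def largest_empty_rectangle_area(points):
    """Maximum area of an open axis-parallel rectangle inside [0,1]^2
    that contains none of the given points (brute force over candidates)."""
    xs = sorted({0.0, 1.0} | {p[0] for p in points})
    ys = sorted({0.0, 1.0} | {p[1] for p in points})
    best = 0.0
    for x1, x2 in combinations(xs, 2):
        for y1, y2 in combinations(ys, 2):
            # The rectangle is open, so points on its boundary do not count.
            if not any(x1 < px < x2 and y1 < py < y2 for px, py in points):
                best = max(best, (x2 - x1) * (y2 - y1))
    return best

# The two-point configuration used in the tightness argument of Lemma 1.
xi = (3 - 5 ** 0.5) / 2
tight = [(xi, 1 - xi), (1 - xi, xi)]
area = largest_empty_rectangle_area(tight)
```

The running time is polynomial but far from efficient, so this is only meant for very small point sets.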

The proof of the next lemma appears in Appendix A.

Lemma 2. Given 4 points in the unit square, there exists an empty rectangle with area at least $\frac{1}{4}$. This
bound is tight, i.e., $A(4) = \frac{1}{4}$.

Theorem 1. Given $n$ points in the unit square, there exists an empty rectangle with area at least $\left(\frac{5}{4} - o(1)\right) \cdot \frac{1}{n}$.
That is, $A(n) \ge \left(\frac{5}{4} - o(1)\right) \cdot \frac{1}{n}$.

Proof. Write $n = 5k + r$, for some $k \in \mathbb{N}$ and $r \in \{0, 1, 2, 3, 4\}$. Partition $U$ into $k + 1$ rectangles of
equal width. There exists at least one rectangle $R'$ with at most 4 points in its interior. By Lemma 2 and
Observation 1, $R'$ contains an empty rectangle of area at least

$$\frac{1}{4} \cdot \frac{1}{k+1} \ge \frac{5}{4} \cdot \frac{1}{n+5} = \left(\frac{5}{4} - o(1)\right) \cdot \frac{1}{n},$$

as claimed. □

The lower bound derived in the proof, $\frac{5}{4} \cdot \frac{1}{n+5}$, is better than $\frac{1}{n+1}$ for all $n \ge 16$. For $n = 5k + 4$, the
resulting bound is $\frac{5}{4} \cdot \frac{1}{n+1}$. An alternative partition, yielding the same bound in Theorem 1, can be obtained
by dividing $U$ into rectangles with vertical lines through every 5th point of the set. Slightly better lower
bounds, particularly for small values of $n$, can be obtained by constructing different partitions tailored for
specific values of $k$, $r$ (with a number of points other than 4 in a few of the rectangles), and using estimates
on $A(2)$, A{Q), etc. For instance, from Lemma 2 we can derive that $A(6) \ge 3 - 2\sqrt{2} = 0.1715\ldots$.
Incidentally, we remark that a suitable 6-point construction gives from the other direction that $A(6) \le 0.2$.
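The arithmetic behind the comparison of the two lower bounds can be verified with exact rational numbers. This is only an illustrative sketch; the function names `trivial_bound` and `partition_bound` are hypothetical labels for the bound $\frac{1}{n+1}$ of Proposition 1 and the bound $\frac{1}{4} \cdot \frac{1}{k+1}$ with $k = \lfloor n/5 \rfloor$ from the proof of Theorem 1.

```python
from fractions import Fraction

def trivial_bound(n):
    """Lower bound 1/(n+1) from Proposition 1."""
    return Fraction(1, n + 1)

def partition_bound(n):
    """Lower bound (1/4) * 1/(k+1) from the proof of Theorem 1,
    where n = 5k + r with 0 <= r <= 4."""
    k = n // 5
    return Fraction(1, 4 * (k + 1))

# For n = 5k + 4 the partition bound equals (5/4) * 1/(n+1).
example = partition_bound(9)
```

Using `Fraction` avoids any floating-point doubt when checking where one bound overtakes the other.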



The upper bound. Let $C_n$ be the van der Corput set of $n$ points [9, 10], with coordinates $(x(k), y(k))$,
$0 \le k \le n - 1$, constructed as follows [7, 16]: Let $x(k) = k/n$. If $k = \sum_{j \ge 0} a_j 2^j$ is the binary
representation of $k$, where $a_j \in \{0, 1\}$, then $y(k) = \sum_{j \ge 0} a_j 2^{-j-1}$. Observe that all points in $C_n$ lie in the
unit square $U = [0, 1]^2$.

Theorem 2. For the van der Corput set of $n$ points, $C_n \subset U$, the area of the largest empty axis-parallel
rectangle is less than $4/n$.

Proof. Let $B$ be any open empty axis-parallel rectangle inside the unit square. We next show that the area
of $B$ is less than $4/n$. (The argument we use here is similar to that used for bounding the geometric
discrepancy of the van der Corput set of points.) Following the presentation in [16, p. 39], a canonical interval is an interval of the
form $[u \cdot 2^{-v}, (u + 1) \cdot 2^{-v})$ for some positive integer $v$ and an integer $u \in [0, 2^v - 1]$.

Let $I_y = [t \cdot 2^{-q}, (t + 1) \cdot 2^{-q})$ be the longest canonical interval contained in the projection of the empty
rectangle $B$ onto the $y$-axis (recall that $B$ is open, so this projection is an open interval). Then the side
length of $B$ along $y$ must be less than $2 \cdot 2^{-q+1}$ because otherwise the projection would contain a longer
canonical interval of length $2^{-q+1}$.

Let $k = \sum_{j \ge 0} a_j 2^j$ be the binary representation of an integer $k$, $0 \le k \le n - 1$. In the van der Corput
construction, a point in $C_n$ with $x$-coordinate $k/n$ has its $y$-coordinate in the canonical interval $I_y$ if and
only if $t \cdot 2^{-q} \le \sum_{j \ge 0} a_j 2^{-j-1} < (t + 1) \cdot 2^{-q}$, which happens exactly when $\sum_{j=0}^{q-1} a_j 2^{-j-1} = t \cdot 2^{-q}$. In
this case, $k \bmod 2^q = \sum_{j=0}^{q-1} a_j 2^j$ is a constant $z = z(t, q)$. It then follows that the side length of $B$ along
$x$ is at most $2^q/n$. Therefore the area of $B$ is less than $2 \cdot 2^{-q+1} \cdot 2^q/n = 4/n$, as required. □

Corollary 1. $A(n) < 4 \cdot \frac{1}{n}$.
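For small $n$, the statement of Theorem 2 can be checked directly. The sketch below builds the van der Corput points by reversing the binary digits of $k$ and compares the largest empty rectangle, found by brute force, with $4/n$. The function names are hypothetical, and the brute-force search is an illustration only, not an algorithm from the paper.

```python
from itertools import combinations

def van_der_corput(n):
    """Points (k/n, y(k)), k = 0..n-1, where y(k) mirrors the binary
    digits of k about the binary point."""
    pts = []
    for k in range(n):
        y, weight, rest = 0.0, 0.5, k
        while rest:
            y += (rest & 1) * weight   # digit a_j contributes a_j * 2^(-j-1)
            rest >>= 1
            weight /= 2
        pts.append((k / n, y))
    return pts

def largest_empty_rectangle_area(points):
    """Brute force over rectangles whose sides lie on point coordinates
    or on the boundary of the unit square."""
    xs = sorted({0.0, 1.0} | {p[0] for p in points})
    ys = sorted({0.0, 1.0} | {p[1] for p in points})
    best = 0.0
    for x1, x2 in combinations(xs, 2):
        for y1, y2 in combinations(ys, 2):
            if not any(x1 < px < x2 and y1 < py < y2 for px, py in points):
                best = max(best, (x2 - x1) * (y2 - y1))
    return best

points8 = van_der_corput(8)
```

Since the brute force examines on the order of $n^4$ rectangles, it is practical only for a few dozen points.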

2.2 Empty boxes in higher dimensions 

As in the planar case, $A_d(n) \ge \frac{1}{n+1}$ is immediate, by partitioning the hypercube $U_d$ with $n$ axis-parallel
hyperplanes, one through each of the $n$ points. By projecting the $n$ points on one of the faces of $U_d$, and
proceeding by induction on $d$, it follows that the lower bound in Theorem 1 carries over here too. Thus we
have:

Proposition 2. $A_d(n) \ge \frac{1}{n+1}$. Moreover, $A_d(n) \ge \left(\frac{5}{4} - o(1)\right) \cdot \frac{1}{n}$.

We next show that, modulo a constant factor depending on $d$, this estimate is also best possible. Let
$H_n$ be the Halton-Hammersley set of $n$ points [14, 15], with coordinates $(x_0(k), x_1(k), \ldots, x_{d-1}(k))$,
$0 \le k \le n - 1$, constructed as follows fT^, T6l: Let $p_i$ be the $i$th prime number. Each integer $k$ has a unique
base-$p_i$ representation $k = \sum_{j \ge 0} a_{i,j} p_i^j$, where $a_{i,j} \in [0, p_i - 1]$. Let $x_0(k) = k/n$, and let
$x_i(k) = \sum_{j \ge 0} a_{i,j} p_i^{-j-1}$, $1 \le i \le d - 1$. Then all points in $H_n$ are inside the unit hypercube $U_d = [0, 1]^d$.

Theorem 3. For the Halton-Hammersley set of $n$ points, $H_n \subset U_d$, the volume of the largest empty
axis-parallel box is less than $\left(2^{d-1} \prod_{i=1}^{d-1} p_i\right) / n$, where $p_i$ is the $i$th prime.

Proof. Let $B$ be any open empty box inside the unit hypercube. We next show that the volume of $B$ is less
than $\left(2^{d-1} \prod_{i=1}^{d-1} p_i\right) / n$. Generalizing the planar case, a canonical interval of the axis $x_i$, $1 \le i \le d - 1$, is
an interval of the form $[u \cdot p_i^{-v}, (u + 1) \cdot p_i^{-v})$ for some positive integer $v$ and an integer $u \in [0, p_i^v - 1]$.
Note that $p_1 = 2$.

First consider each axis $x_i$, $1 \le i \le d - 1$. Let $I_i = [t_i \cdot p_i^{-q_i}, (t_i + 1) \cdot p_i^{-q_i})$ be a longest canonical
interval (there could be more than one for $i \ge 2$) contained in the projection of the empty box $B$ onto the
axis $x_i$. Then the side length of $B$ along $x_i$ must be less than $2 \cdot p_i^{-q_i+1}$ because otherwise the projection
would contain a longer canonical interval of length $p_i^{-q_i+1}$.

Next consider the axis $x_0$. Let $k = \sum_{j \ge 0} a_{i,j} p_i^j$ be the base-$p_i$ representation of an integer $k$,
$0 \le k \le n - 1$ and $1 \le i \le d - 1$. In the Halton-Hammersley construction, a point in $H_n$ with $x_0$-coordinate $k/n$
has its $x_i$-coordinate in the canonical interval $I_i$ if and only if
$t_i \cdot p_i^{-q_i} \le \sum_{j \ge 0} a_{i,j} p_i^{-j-1} < (t_i + 1) \cdot p_i^{-q_i}$,
which happens exactly when $\sum_{j=0}^{q_i-1} a_{i,j} p_i^{-j-1} = t_i \cdot p_i^{-q_i}$. In this case,
$k \bmod p_i^{q_i} = \sum_{j=0}^{q_i-1} a_{i,j} p_i^j$ is a constant $z_i = z_i(t_i, q_i)$.

Note that the $d - 1$ integers $p_i^{q_i}$, $1 \le i \le d - 1$, are relatively prime. By the Chinese remainder theorem,
it follows that a point in $H_n$ with $x_0$-coordinate $k/n$ has its $x_i$-coordinate in the canonical interval $I_i$ for
all $1 \le i \le d - 1$ if and only if $k \bmod \prod_{i=1}^{d-1} p_i^{q_i}$ is some integer $z = z(t_1, q_1; \ldots; t_{d-1}, q_{d-1})$.

Therefore the side length of $B$ along $x_0$ is at most $\left(\prod_{i=1}^{d-1} p_i^{q_i}\right) / n$. Consequently, the volume of $B$ is less
than

$$\left(\prod_{i=1}^{d-1} 2 \cdot p_i^{-q_i+1}\right) \cdot \left(\prod_{i=1}^{d-1} p_i^{q_i}\right) / n = \left(2^{d-1} \prod_{i=1}^{d-1} p_i\right) / n. \quad □$$






Corollary 2. $A_d(n) < \left(2^{d-1} \prod_{i=1}^{d-1} p_i\right) \cdot \frac{1}{n}$.
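The Halton-Hammersley construction and the bound of Theorem 3 are easy to write down in code. The following is a small sketch under stated assumptions: the names `radical_inverse`, `halton_hammersley`, `theorem3_bound` and the constant `FIRST_PRIMES` are hypothetical, and the hard-coded prime list limits the sketch to $d \le 9$.

```python
def radical_inverse(k, base):
    """Mirror the base-`base` digits of k about the radix point:
    k = sum a_j * base**j  maps to  sum a_j * base**(-j-1)."""
    x, scale = 0.0, 1.0 / base
    while k > 0:
        k, digit = divmod(k, base)
        x += digit * scale
        scale /= base
    return x

FIRST_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)  # assumption: enough for d <= 9

def halton_hammersley(n, d):
    """The n points (k/n, x_1(k), ..., x_{d-1}(k)) for k = 0..n-1."""
    return [
        (k / n,) + tuple(radical_inverse(k, FIRST_PRIMES[i]) for i in range(d - 1))
        for k in range(n)
    ]

def theorem3_bound(n, d):
    """The quantity (2^(d-1) * p_1 * ... * p_(d-1)) / n from Theorem 3."""
    prod = 1
    for p in FIRST_PRIMES[: d - 1]:
        prod *= p
    return 2 ** (d - 1) * prod / n

pts = halton_hammersley(4, 3)
```

For $d = 2$ the construction reduces to the van der Corput set and the bound to $4/n$.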

It is known that {Yli=iPi)/x'' 1 as x ^ oo EOl. 

3 A $(1-\varepsilon)$-approximation algorithm for finding the largest empty box

Let $R$ be an axis-parallel $d$-dimensional box in $\mathbb{R}^d$ containing $n$ points. In this section, we present an efficient
$(1-\varepsilon)$-approximation algorithm for computing a maximum-volume empty axis-parallel box contained in $R$.

Theorem 4. Given an axis-parallel $d$-dimensional box $R$ in $\mathbb{R}^d$ containing $n$ points, there is a
$(1-\varepsilon)$-approximation algorithm, running in

$$O\left((8ed\varepsilon^{-2})^d \cdot n \log^d n\right)$$

time, for computing a maximum-volume empty axis-parallel box contained in $R$.

We first set a few parameters.

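Parameters. We assume that $0 < \varepsilon < 1$, and $d \ge 3$, which cover all cases of interest. To somewhat
simplify our calculations we also assume that $n \ge 12$. Let us choose parameters

$$\delta = \frac{\varepsilon}{2d}, \quad \text{and} \quad a = \frac{1}{1-\delta}. \tag{2}$$

Let

$$m = \left\lceil \frac{1}{\delta} \right\rceil = \left\lceil \frac{2d}{\varepsilon} \right\rceil.$$

Let $k$ be the unique positive integer such that

$$a^{k-1} \le n + 1 < a^k. \tag{3}$$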

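We next derive some inequalities that follow from this setting. By the assumptions $0 < \varepsilon < 1$ and $d \ge 3$,
we have $\delta = \frac{\varepsilon}{2d} < \frac{1}{6}$, and $m \ge 2d/\varepsilon > 2d \ge 6$. Then a simple calculation shows that

$$\frac{1}{1-\delta} \le 1 + \frac{6}{5}\delta = 1 + \frac{3\varepsilon}{5d}. \tag{4}$$

It is also clear that $a = \frac{1}{1-\delta} > 1 + \delta$. So $a$ satisfies

$$1 < 1 + \delta < a = \frac{1}{1-\delta} \le 1 + \frac{6}{5}\delta < \frac{6}{5}. \tag{5}$$

Since $n \ge 12$ and $a < \frac{6}{5}$, it follows from the second inequality in (3) that $k \ge 15$. We now derive an upper
bound on $k$ as a function of $n$, $d$ and $\varepsilon$. First observe that

$$\log a = \log \frac{1}{1-\delta} \ge \log(1 + \delta).$$

We also have

$$\ln(1 + \delta) \ge 0.9\,\delta \quad \text{for } \delta \le \frac{1}{6}.$$

From (3) we deduce the following sequence of inequalities:

$$k - 1 \le \frac{\log(n+1)}{\log a} \le \frac{\log(n+1)}{\log(1+\delta)} = \frac{\log(n+1) \cdot \ln 2}{\ln(1+\delta)} \le \frac{\log(n+1) \cdot \ln 2}{0.9\,\delta} \le \frac{0.78 \log(n+1)}{\delta}. \tag{6}$$

A straightforward calculation (where we use $n \ge 12$ and $\delta \le 1/6$) gives

$$k \le 1 + \frac{0.78 \log(n+1)}{\delta} \le \frac{0.78 \log(n+1) + 1/6}{\delta} \le \frac{\log n}{\delta} = \frac{2d}{\varepsilon} \cdot \log n. \tag{7}$$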
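The parameter setting can be made concrete with a few lines of code. The sketch below assumes the definitions $\delta = \varepsilon/(2d)$, $a = 1/(1-\delta)$, $m = \lceil 2d/\varepsilon \rceil$, and $k$ as the unique integer with $a^{k-1} \le n + 1 < a^k$; the function name `parameters` is hypothetical, and floating-point logarithms are used for convenience, so inputs where $n + 1$ is extremely close to a power of $a$ may need exact arithmetic.

```python
import math

def parameters(n, d, eps):
    """Return (delta, a, m, k) for given n >= 12, d >= 3 and 0 < eps < 1."""
    if not (0 < eps < 1 and d >= 3 and n >= 12):
        raise ValueError("expected 0 < eps < 1, d >= 3 and n >= 12")
    delta = eps / (2 * d)
    a = 1 / (1 - delta)
    m = math.ceil(2 * d / eps)
    # a^(k-1) <= n+1 < a^k  is equivalent to  k = floor(log_a(n+1)) + 1
    k = math.floor(math.log(n + 1) / math.log(a)) + 1
    return delta, a, m, k

delta, a, m, k = parameters(1000, 3, 0.5)
```

With $n = 1000$, $d = 3$ and $\varepsilon = 0.5$, this gives $\delta = 1/12$ and $m = 12$, and the resulting $k$ stays well below $\frac{2d}{\varepsilon} \log n$.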
Overview of the algorithm. By a direct generalization of Observation 1, we can assume w.l.o.g. that
$R = [0, 1]^d$. Let $S$ be the set of $n$ points contained in $R$. The algorithm generates a finite set $\mathcal{B}$ of canonical
boxes; to be precise, only a subset of large canonical boxes. For each large canonical box $B_0 \in \mathcal{B}$, a
corresponding canonical grid is considered, and $B_0$ is placed with its lowest corner at each such grid position
and tested for emptiness and containment in $R$. The one with the largest volume amongst these is returned
in the end.



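Canonical boxes and their associated grids. Consider the set $\mathcal{B}$ of canonical boxes, all of whose side
lengths are elements of

$$\mathcal{X} = \left\{ \frac{a^i}{a^{k+1}} : i = 0, 1, \ldots, k - 1 \right\}. \tag{8}$$

For a given canonical box $B_0 \in \mathcal{B}$, with sides $X_1, \ldots, X_d \in \mathcal{X}$, consider the canonical grid associated with
$B_0$ with points of coordinates

$$\left( \frac{t_1 X_1}{m}, \ldots, \frac{t_d X_d}{m} \right), \quad t_1, \ldots, t_d \ge 0 \tag{9}$$

contained in $U_d$.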

Let $B$ be a maximum-volume empty box in $R = U_d$, with $V_{\max} = \mathrm{vol}(B)$. By the trivial inequality
$A_d(n) \ge \frac{1}{n+1}$ of Proposition 2, we have $V_{\max} \ge \frac{1}{n+1}$. This lower bound is crucial in the design of
our approximation algorithm, as it enables us to bound from above the number of large canonical boxes
(canonical boxes of smaller volume can be safely ignored).

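Consider the following set $\mathcal{I}$ of $k + 1$ intervals

$$\mathcal{I} = \left\{ \left[ \frac{a^i}{a^{k+1}}, \frac{a^{i+1}}{a^{k+1}} \right) : i = 0, 1, \ldots, k \right\}. \tag{10}$$

Observe that for each $i = 1, \ldots, d$, the extent in the $i$th coordinate of $B$ is at least $\frac{1}{a^k} = \frac{a}{a^{k+1}}$, since
otherwise we would have $\mathrm{vol}(B) < \frac{1}{a^k} < \frac{1}{n+1}$, a contradiction. Let $Z_i$ be the extent in the $i$th coordinate
of $B$, for $i = 1, \ldots, d$. By the above observation, for each $i = 1, \ldots, d$, $Z_i$ belongs to one of the last $k$
intervals in the set $\mathcal{I}$. That is, there exists an integer $y_i \in \{0, 1, \ldots, k - 1\}$, such that

$$Z_i \in \left[ \frac{a^{y_i+1}}{a^{k+1}}, \frac{a^{y_i+2}}{a^{k+1}} \right). \tag{11}$$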



The next lemma shows that $B$ contains an (empty) canonical box with side lengths

$$X_i = \frac{a^{y_i}}{a^{k+1}}, \quad i = 1, \ldots, d, \tag{12}$$

at some position in the canonical grid associated with it. We call such a canonical box, contained in a
maximum-volume empty box, a large canonical box. Two key properties of large canonical boxes are
proved in Lemma 4 and Lemma 5.

Lemma 3. If for each $i = 1, \ldots, d$, the extent in the $i$th coordinate of $B$ belongs to the interval as in (11),
then $B$ contains an (empty) large canonical box $B_0$ with side lengths as in (12) at some position in the
canonical grid associated with it.

Proof. It is enough to prove the containment for each coordinate axis $i$. Let $I$ and $I_0$ be the corresponding
intervals of $B$ and $B_0$, respectively. Assume for contradiction that the placement of $I_0$ with its left endpoint
at the first canonical grid position in $I$ is not contained in $I$. But then we have, by taking into account the
grid cell lengths:

$$\frac{a^{y_i+1}}{a^{k+1}} \le |I| < |I_0| + \frac{|I_0|}{m} \le |I_0| + \delta \cdot |I_0| = (1 + \delta) \frac{a^{y_i}}{a^{k+1}},$$

and consequently,

$$a < 1 + \delta.$$

We reached a contradiction to the 2nd inequality in (5), and the proof is complete. □

We now show that the (empty) large canonical box $B_0 \subseteq B$ from the previous lemma yields a
$(1-\varepsilon)$-approximation for the empty box $B$ of maximum volume.

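Lemma 4.

$$\mathrm{vol}(B_0) \ge (1 - \varepsilon) \cdot \mathrm{vol}(B).$$

Proof. By (11) and (12),

$$\mathrm{vol}(B_0) = \prod_{i=1}^{d} X_i \ge \frac{1}{a^{2d}} \prod_{i=1}^{d} Z_i = \frac{1}{a^{2d}} \cdot \mathrm{vol}(B).$$

It remains to be shown that

$$\frac{1}{a^{2d}} \ge 1 - \varepsilon.$$

But this follows from our choice of $a$ and from Bernoulli's inequality:

$$(1 + x)^q \ge 1 + qx, \text{ for any } x \ge -1, \text{ and any positive integer } q.$$

Indeed,

$$\frac{1}{a^{2d}} = \left(1 - \frac{\varepsilon}{2d}\right)^{2d} \ge 1 - 2d \cdot \frac{\varepsilon}{2d} = 1 - \varepsilon,$$

and the proof of Lemma 4 is complete. □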
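The rounding step behind Lemma 4 can be illustrated numerically. The sketch below assumes that every extent $Z \ge a^{-k}$ is rounded down to a canonical side $a^{y}/a^{k+1}$ with $a^{y+1}/a^{k+1} \le Z$ and $y \in \{0, \ldots, k-1\}$, so that each coordinate loses a factor of at most $a^2$. The function names are hypothetical, and random extents stand in for the extents of an actual maximum-volume empty box.

```python
import math
import random

def canonical_side(z, a, k):
    """Canonical side a^y / a^(k+1), y in {0..k-1}, chosen so that
    a^(y+1) / a^(k+1) <= z (up to floating-point rounding)."""
    y = math.floor(math.log(z) / math.log(a) + k + 1) - 1
    y = max(0, min(k - 1, y))      # clamp to the allowed exponent range
    return a ** (y - k - 1)

def worst_volume_ratio(n, d, eps, trials=1000, seed=0):
    """Smallest observed ratio vol(canonical box) / vol(box) over random extents."""
    delta = eps / (2 * d)
    a = 1 / (1 - delta)
    k = math.floor(math.log(n + 1) / math.log(a)) + 1
    rng = random.Random(seed)
    worst = 1.0
    for _ in range(trials):
        extents = [rng.uniform(a ** (-k), 1.0) for _ in range(d)]
        sides = [canonical_side(z, a, k) for z in extents]
        worst = min(worst, math.prod(sides) / math.prod(extents))
    return worst

worst = worst_volume_ratio(1000, 3, 0.5)
```

In this setting the observed ratio never drops below $a^{-2d} = (1 - \varepsilon/(2d))^{2d}$, which is at least $1 - \varepsilon$ by Bernoulli's inequality.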

Observe that the number of canonical boxes in $\mathcal{B}$ is exactly $k^d$, and by (7) is bounded from above as
follows:

$$k^d \le \left(\frac{2d}{\varepsilon}\right)^d \cdot \log^d n. \tag{13}$$

We can prove however a better upper bound on the number of large canonical boxes.

Lemma 5. The number of large canonical boxes in $\mathcal{B}$ is at most

$$\left(\frac{2e}{\varepsilon}\right)^d \cdot \log^d n.$$
Proof. Recall that $\mathrm{vol}(B)$ satisfies

$$\frac{1}{a^k} < \frac{1}{n+1} \le \mathrm{vol}(B) \le \prod_{i=1}^{d} \frac{a^{y_i+2}}{a^{k+1}},$$

for some integers $y_i \in \{0, 1, \ldots, k - 1\}$. It follows immediately that

$$dk - k - d \le \sum_{i=1}^{d} y_i \le dk - d, \tag{14}$$

and we want an upper bound on the number of solutions of (14). Make the substitution $z_i = k - 1 - y_i$, for
$i = 1, 2, \ldots, d$. So $z_i \in \{0, 1, \ldots, k - 1\}$, for $i = 1, 2, \ldots, d$. The above inequalities are equivalent to

$$0 \le \sum_{i=1}^{d} z_i \le k. \tag{15}$$

Let $t$ be a nonnegative integer. It is well-known (see for instance fill) that the number of nonnegative
integer solutions of the equation $\sum_{i=1}^{d} z_i = t$ equals $\binom{t+d-1}{d-1}$, that is, when we ignore the upper bound on
each $z_i$. Summing over all values of $t \in \{0, 1, \ldots, k\}$, and using a well-known binomial identity (see for
instance [21, p. 217]) yields that the number of solutions of (15), hence also of (14), is no more than

$$\sum_{t=0}^{k} \binom{t+d-1}{d-1} = \binom{k+d-1+1}{d-1+1} = \binom{k+d}{d}.$$

A well-known upper bound approximation for binomial coefficients,

$$\binom{n}{k} \le \left(\frac{en}{k}\right)^k$$

for positive integers $n$ and $k$ with $1 \le k \le n$, further yields that

$$\binom{k+d}{d} \le \left(\frac{e(k+d)}{d}\right)^d. \tag{16}$$

We now check that

$$k + d \le \frac{2d}{\varepsilon} \cdot \log n.$$

Recall inequality (6). A straightforward calculation (where we use $n \ge 12$, $d \ge 3$, and $\varepsilon \le 1$), gives

$$k + d \le \frac{0.78 \log(n+1)}{\delta} + d + 1 \le \frac{0.78 \log(n+1) + 2/3}{\delta} \le \frac{\log n}{\delta} = \frac{2d}{\varepsilon} \cdot \log n, \tag{17}$$

as claimed. Substituting this upper bound into (16) yields

$$\binom{k+d}{d} \le e^d \left(\frac{2}{\varepsilon} \cdot \log n\right)^d = \left(\frac{2e}{\varepsilon}\right)^d \cdot \log^d n, \tag{18}$$

as required. This expression is an upper bound on the number of solutions of (14), hence also on the number
of large canonical boxes in $\mathcal{B}$. □
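The counting step in this proof can be sanity-checked by brute force for small parameters. The sketch below enumerates all exponent vectors $(y_1, \ldots, y_d) \in \{0, \ldots, k-1\}^d$ whose sum lies between $dk - k - d$ and $dk - d$, and compares the count with $\binom{k+d}{d}$ and with $(e(k+d)/d)^d$. The function name is hypothetical, and the small values of $k$ and $d$ are chosen only to keep the enumeration cheap; they do not satisfy the paper's assumption $k \ge 15$.

```python
from itertools import product
from math import comb, e

def count_exponent_vectors(k, d):
    """Number of (y_1..y_d) in {0..k-1}^d with dk-k-d <= sum(y) <= dk-d."""
    lo, hi = d * k - k - d, d * k - d
    return sum(1 for y in product(range(k), repeat=d) if lo <= sum(y) <= hi)

k, d = 6, 3
count = count_exponent_vectors(k, d)
binomial_bound = comb(k + d, d)
```

The count can be strictly smaller than the binomial bound, because the binomial count ignores the upper limit $z_i \le k - 1$ on each variable after the substitution $z_i = k - 1 - y_i$.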

Given a grid with cell lengths $x_1, x_2, \ldots, x_d$, we superimpose it so that the origin of $U_d$ is a grid point of
the above grid. Denote the corresponding grid cells by index tuples $(i_1, i_2, \ldots, i_d)$, where $i_1, i_2, \ldots, i_d \ge 0$.
Note that some of the grid cells on the boundary of $U_d$ may be smaller. Given a grid $G$ superimposed on $U_d$,
let $M(G)$ be the number of cells (with nonempty interior) into which $U_d$ is partitioned.

Consider a (fixed) canonical box, say $B_0$, with side lengths as in (12). The associated canonical grid,
say $G_0$, has side lengths $m$ times smaller in each coordinate. We now derive an upper bound on the number
of canonical grid positions where a canonical box is placed and tested for emptiness, according to (9).

Lemma 6. The number of canonical grid positions for placing $B_0$ in $G_0$ is bounded as follows:

$$M(G_0) \le 12 \cdot \left(\frac{2d}{\varepsilon}\right)^d \cdot n.$$
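Proof. We have

$$M(G_0) = \prod_{i=1}^{d} \left\lceil \frac{m \cdot a^{k+1}}{a^{y_i}} \right\rceil \le \prod_{i=1}^{d} \left( \frac{m \cdot a^{k+1}}{a^{y_i}} + 1 \right).$$

Observe that

$$\frac{m \cdot a^{k+1}}{a^{y_i}} + 1 \le \frac{(m + 1) \cdot a^{k+1}}{a^{y_i}}.$$

By substituting this bound in the product we get that

$$M(G_0) \le \prod_{i=1}^{d} \frac{(m+1) \cdot a^{k+1}}{a^{y_i}} = (m+1)^d \cdot \frac{a^{kd+d}}{a^{\sum_{i=1}^{d} y_i}} \le (m+1)^d \cdot \frac{a^{kd+d}}{a^{dk-k-d}} = (m+1)^d \cdot a^{2d} \cdot a^k. \tag{19}$$

For the last inequality above we used (14). We now bound from above each of the three factors in (19). For
bounding the second and the third factors we use inequalities (5) and (3), respectively.

$$(m+1)^d \le \left(\frac{2d}{\varepsilon} + 2\right)^d = \left(\frac{2d}{\varepsilon}\right)^d \left(1 + \frac{\varepsilon}{d}\right)^d \le e^{\varepsilon} \left(\frac{2d}{\varepsilon}\right)^d \le e \left(\frac{2d}{\varepsilon}\right)^d,$$

$$a^{2d} \le \left(1 + \frac{3\varepsilon}{5d}\right)^{2d} \le e^{6\varepsilon/5} \le e^{6/5},$$

$$a^k = a \cdot a^{k-1} \le a(n+1) \le \frac{6}{5} \cdot (n+1) \le \frac{13n}{10}, \text{ for } n \ge 12.$$

Substituting these upper bounds in (19) gives the desired bound:

$$M(G_0) \le e^{11/5} \left(\frac{2d}{\varepsilon}\right)^d \cdot \frac{13n}{10} \le 12 \cdot \left(\frac{2d}{\varepsilon}\right)^d \cdot n. \quad □$$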

Testing canonical boxes for emptiness. Given a grid with cell lengths $x_1, x_2, \ldots, x_d$, denote the
corresponding grid cell counts or cell numbers (i.e., the number of points) in cell $(i_1, i_2, \ldots, i_d)$ by $n(i_1, i_2, \ldots, i_d)$.
For simplicity, we can assume w.l.o.g. that in all the grids that are generated by the algorithm, no point of
$S$ lies on a grid cell boundary. Indeed the points of $S$ on the boundary of $R = U_d$ can be safely ignored,
and the above condition holds with probability 1 if instead of the given e, the algorithm uses a value chosen 
uniformly at random from the interval [(1 — ^)e, e]', see also the setting of the parameters in (O. Given a 
grid $G$, and dimensions (array sizes) $M_1, \ldots, M_d \ge 1$, a floating box at some position aligned with it, that
is, whose lower left corner is a grid point, and with the specified dimensions is called a grid box. All the
canonical boxes generated by our algorithm are in fact grid boxes.

The next four lemmas (7, 8, 9, 10) outline the method we use for efficiently computing the number
of points in $S$ in a rectangular box, over a sequence of boxes. In particular these boxes can be tested for
emptiness within the same specified time.

Lemma 7. Let $G$ be a grid with cell lengths $x_1, x_2, \ldots, x_d$, superimposed on $U_d$, with $M(G)$ cells. Then
the number of points of $S$ lying in each cell, over all cells, can be computed in $O(d \cdot n + M(G))$ time.

Proof. The number of points in each of the $M(G)$ cells is initialized to 0. For each point $p \in S$, its cell index
tuple (label) is computed in $O(d)$ time using the floor function for each coordinate, and the corresponding
cell count is updated. □
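The counting procedure in this proof amounts to one floor operation per coordinate. The following is a minimal sketch; the function name `cell_counts` is hypothetical, and a `Counter` (hash map) is used in place of the explicitly initialized array of $M(G)$ cells, which is a substitution made here for brevity rather than the data structure described in the proof.

```python
import math
from collections import Counter

def cell_counts(points, cell_lengths):
    """Map each cell index tuple (i_1, ..., i_d) to the number of points in it.
    Points are assumed not to lie on any cell boundary."""
    counts = Counter()
    for p in points:
        # O(d) work per point: one floor per coordinate.
        index = tuple(math.floor(c / x) for c, x in zip(p, cell_lengths))
        counts[index] += 1
    return counts

counts = cell_counts([(0.1, 0.1), (0.15, 0.3), (0.9, 0.9)], (0.25, 0.25))
```

Cells that contain no point simply have count 0 when looked up in the `Counter`.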
Remark. If the floor function is not an option, the number of points in each cell can be computed using
binary search for each coordinate. The resulting time complexity is $O(n \cdot \log M(G))$.

Denote by $N(i_1, i_2, \ldots, i_d)$ the number of points in $S$ in the box with lower left cell $(0, 0, \ldots, 0)$, and
upper right cell $(i_1, i_2, \ldots, i_d)$; we refer to the numbers $N(i_1, i_2, \ldots, i_d)$ as corner box numbers.

Lemma 8. Given a grid $G$ with cell lengths $x_1, x_2, \ldots, x_d$ placed at the origin, with $M(G)$ cells, and grid
cell counts $n(i_1, i_2, \ldots, i_d)$, over all cells, the corner box numbers $N(i_1, i_2, \ldots, i_d)$, over all cells, can be
computed in $O(2^d \cdot M(G))$ time.

Proof. Define $N(i_1, i_2, \ldots, i_d) = 0$, if $i_j < 0$ for some $j$. Let $b = (b_1, b_2, \ldots, b_d) \in \{0, 1\}^d$ be a binary
vector. Let the parity of $b$ be $\pi(b) = b_1 \oplus b_2 \oplus \cdots \oplus b_d$. By the inclusion-exclusion principle, the corner
box numbers are given by the following formula with at most $2^d$ operations:

$$N(i_1, i_2, \ldots, i_d) = n(i_1, i_2, \ldots, i_d) + \sum_{\substack{b = (b_1, b_2, \ldots, b_d) \\ b \ne (0, 0, \ldots, 0)}} (-1)^{\pi(b)+1} N(i_1 - b_1, i_2 - b_2, \ldots, i_d - b_d).$$

Since $G$ has $M(G)$ cells, the bound follows. □

Lemma 9. Given is a grid $G$ with cell lengths $x_1, x_2, \ldots, x_d$ placed at the origin, with $M(G)$ cells, and
corner box numbers $N(i_1, i_2, \ldots, i_d)$, over all cells. Let $B_0$ be a (canonical) grid box with dimensions
(array sizes) $M_1, \ldots, M_d \ge 1$, and lower left cell $(i_1, i_2, \ldots, i_d)$. Then the number of points of $S$ in $B_0$,
denoted $N(B_0)$, can be computed in $O(2^d)$ time.

Proof. Let $j_1 = i_1 + M_1 - 1, \ldots, j_d = i_d + M_d - 1$ be the upper right cell of $B_0$. By the inclusion-exclusion
principle, the corner box number $N(j_1, j_2, \ldots, j_d)$ can be computed as follows:

$$N(j_1, j_2, \ldots, j_d) = N(B_0) + \sum_{\substack{b = (b_1, b_2, \ldots, b_d) \\ b \ne (0, 0, \ldots, 0)}} (-1)^{\pi(b)+1} N(j_1 - b_1 M_1, j_2 - b_2 M_2, \ldots, j_d - b_d M_d).$$

Hence $N(B_0)$ can be extracted from the above formula with at most $2^d$ operations. □

Let $Q(i_1, i_2, \ldots, i_d)$ be the number of points in $S$ in the canonical box of dimensions (array sizes)
$M_1, \ldots, M_d \ge 1$, and lower left cell $(i_1, i_2, \ldots, i_d)$.

Lemma 10. Given is a grid $G$ with cell lengths $x_1, x_2, \ldots, x_d$ placed at the origin, with $M(G)$ cells, and
corner box numbers $N(i_1, i_2, \ldots, i_d)$, over all cells. Then the numbers (counts) $Q(i_1, i_2, \ldots, i_d)$, over all
cells, can be computed in $O(2^d \cdot M(G))$ time.

Proof. There are $M(G)$ cells determined by $G$ in $U_d$, and for each, apply the bound of Lemma 9. □
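The two inclusion-exclusion formulas of Lemmas 8 and 9 translate into a $d$-dimensional prefix-sum table and a constant-size query per box. The sketch below is one possible rendering under stated assumptions: the function names `corner_box_numbers` and `box_count` are hypothetical, cell counts are given as a dictionary keyed by index tuples, and corner box numbers with a negative index are treated as 0 by skipping them.

```python
from itertools import product

def corner_box_numbers(cell_count, shape):
    """N[idx] = number of points in all cells that are <= idx componentwise.
    Uses at most 2^d operations per cell (inclusion-exclusion)."""
    d = len(shape)
    nonzero = [b for b in product((0, 1), repeat=d) if any(b)]
    N = {}
    # Lexicographic order guarantees that all needed predecessors exist.
    for idx in product(*(range(s) for s in shape)):
        total = cell_count.get(idx, 0)
        for b in nonzero:
            prev = tuple(i - bi for i, bi in zip(idx, b))
            if min(prev) >= 0:
                sign = 1 if sum(b) % 2 == 1 else -1   # (-1)^(parity(b) + 1)
                total += sign * N[prev]
        N[idx] = total
    return N

def box_count(N, lower, sizes):
    """Number of points in the grid box with lower left cell `lower`
    and array sizes `sizes`, using at most 2^d table lookups."""
    upper = tuple(l + s - 1 for l, s in zip(lower, sizes))
    total = 0
    for b in product((0, 1), repeat=len(lower)):
        corner = tuple(u - bi * s for u, bi, s in zip(upper, b, sizes))
        if min(corner) >= 0:
            total += (-1) ** sum(b) * N[corner]
    return total

cells = {(0, 0): 1, (1, 1): 2, (2, 0): 1, (2, 2): 3}
N = corner_box_numbers(cells, (3, 3))
```

A grid box is empty exactly when `box_count` returns 0, so sliding a fixed-size box over all lower left cells costs $O(2^d)$ per position once the table is built.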

The last step in the proof of Theorem 4. For each canonical box, say $B_0$, there is a unique associated
canonical grid, say $G_0$. The time taken to test $B_0$ for emptiness and containment in $R$ when placed at all
grid positions in $G_0$ is obtained by adding the running times in Lemmas 7, 8 and 10:

$$O\left(d \cdot n + M(G_0) + 2^d \cdot M(G_0)\right) = O\left(2^d \cdot \left(\frac{2d}{\varepsilon}\right)^d \cdot n\right) = O\left(\left(\frac{4d}{\varepsilon}\right)^d \cdot n\right), \tag{20}$$

where we have used the upper bound on $M(G_0)$ in Lemma 6. By multiplying this with the upper bound on
the number of large canonical boxes in Lemma 5, we get that the total running time of the approximation
algorithm is

$$O\left(\left(\frac{2e}{\varepsilon}\right)^d \cdot \log^d n \cdot \left(\frac{4d}{\varepsilon}\right)^d \cdot n\right) = O\left(\left(\frac{8ed}{\varepsilon^2}\right)^d \cdot n \cdot \log^d n\right). \tag{21}$$

The proof of Theorem 4 is now complete.
A Proof of Lemma 2



To see that A{4) < j, consider the 4 points (|, j), |), (|, and check that the largest empty 

rectangle has area $\frac{1}{4}$. Next we prove the lower bound. Let $S = \{p_1, p_2, p_3, p_4\}$ be a set of 4 points, and
assume without loss of generality that they are in lexicographic order: $x(p_1) \le x(p_2) \le x(p_3) \le x(p_4)$,
and if $x(p_i) = x(p_j)$ for $i < j$, then $y(p_i) \le y(p_j)$. We can also assume that $y(p_1) \le y(p_2)$. Encode each
possible such 4-point configuration by a permutation $\pi$ of 1, 2, 3, 4 as follows: for $i < j$, $\pi(i) < \pi(j)$ if and
only if $y(p_i) \le y(p_j)$. For example $\pi = (2, 4, 3, 1)$ encodes the configuration shown in Fig. 1 (right).




Figure 1: Left: $\pi = (2, 4, 1, 3)$ is special. Right: $\pi = (2, 4, 3, 1)$ is non-special; $s$ is the right side of $U$, $v$ is the lower
left corner of $U$, and $s'$ is the top side of $U$.

By our assumption $y(p_1) \le y(p_2)$, there are only 12 permutations (types) out of the total of $4! = 24$ to
consider, those with $\pi(1) < \pi(2)$. Two of these permutations, namely $(2, 4, 1, 3)$ and $(3, 4, 1, 2)$, are called
special: the 4 points are in convex position and there is an empty rectangle $R \subseteq U$, with one of these points
on each side of $R$. All the remaining 10 permutations are called non-special. We distinguish two cases:

Case 1: $S$ is encoded by a special permutation. For each of the four sides $s$ of $U$, let $P(s)$ be the largest
empty rectangle containing $s$. See Fig. 1 (left) for an example. We can assume that the area of each rectangle
$P(s)$ is smaller than $\frac{1}{4}$, since else we are done. But then it follows that each of the four sides of $R$ is longer
than $1 - \frac{1}{2} = \frac{1}{2}$, so the area of $R$ is larger than $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, so this case is settled.

Case 2: $S$ is encoded by a non-special permutation. For each of the four vertices $v$ of $U$, let $Q(v)$ be the
largest empty rectangle having $v$ as a vertex. A routine verification shows that for each of the 10 non-special
permutations there is a side $s$ of $U$ and a vertex $v$ of $U$ such that (i) $P(s)$ and $Q(v)$ have a common boundary
segment, and (ii) $v$ is an endpoint of the side opposite to $s$. More precisely, if $\pi$ is one of the six permutations
$(1,2,3,4)$, $(1,2,4,3)$, $(1,3,2,4)$, $(1,3,4,2)$, $(1,4,2,3)$, $(1,4,3,2)$, then $s$ is the left side, and $v$ is the
lower-right corner; if $\pi$ is one of the four permutations $(2,3,1,4)$, $(2,3,4,1)$, $(2,4,3,1)$, $(3,4,2,1)$, then $s$ is
the right side, and $v$ is the lower-left corner. See Fig. 1 (right) for an example.
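
The bookkeeping of the case split can be checked mechanically. The following Python sketch enumerates the permutations with $\pi(1) < \pi(2)$ and confirms that the two special types, the six types paired with the left side, and the four types paired with the right side account for all 12 types exactly once. The names `special`, `left_side`, `right_side` and `classify` are ours, and the sketch checks only the enumeration, not the geometric claims (i) and (ii).

```python
from itertools import permutations

# All types with pi(1) < pi(2), written as tuples (pi(1), pi(2), pi(3), pi(4)).
types = [p for p in permutations((1, 2, 3, 4)) if p[0] < p[1]]

# The three groups exactly as listed in the text.
special = {(2, 4, 1, 3), (3, 4, 1, 2)}
left_side = {(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4),
             (1, 3, 4, 2), (1, 4, 2, 3), (1, 4, 3, 2)}
right_side = {(2, 3, 1, 4), (2, 3, 4, 1), (2, 4, 3, 1), (3, 4, 2, 1)}


def classify(p):
    """Return which case of the proof handles the type p."""
    if p in special:
        return "case 1 (special)"
    if p in left_side:
        return "case 2, s = left side, v = lower-right corner"
    if p in right_side:
        return "case 2, s = right side, v = lower-left corner"
    raise ValueError("type not covered by the case analysis")


if __name__ == "__main__":
    for p in types:
        print(p, "->", classify(p))
```

Every type starting with 1 falls in the left-side group, which is why that group has exactly six members.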

As in Case 1, we can assume that the area of $P(s)$ is smaller than $\frac{1}{4}$, thus its shorter side is smaller than
$\frac{1}{4}$. By the same token, one of the sides of $Q(v)$ is longer than $1 - \frac{1}{4} = \frac{3}{4}$, hence the other side must be
shorter than $\frac{1}{3}$, since otherwise the area of $Q(v)$ would exceed $\frac{1}{4}$. Let $s'$ be the side of $U$ adjacent to $s$ and
disjoint from $v$. Consequently, the rectangle $R'$ with side $s'$ and adjacent to $Q(v)$ has the other side longer
than $1 - \frac{1}{3} = \frac{2}{3}$. Observe that $R'$ has at most two points in its interior. By Lemma 1 and Observation 1, $R'$
contains an empty rectangle of area at least

$$\frac{2}{3} \cdot \frac{3 - \sqrt{5}}{2} = \frac{3 - \sqrt{5}}{3} = 0.254\ldots > \frac{1}{4},$$

as required. This concludes the analysis of the second case.

Thus in both cases, there is an empty rectangle of area at least $\frac{1}{4}$. □

B Empty squares and hypercubes



Define $A'_d(n)$ as the minimum, over all $n$-element point sets in $[0, 1]^d$, of the volume of the largest empty
axis-parallel hypercube, analogous to $A_d(n)$ for the largest empty axis-parallel box. For simplicity we sometimes omit
the subscript $d$ in the planar case ($d = 2$). That is, $A'(n)$ denotes the area of the largest empty axis-parallel
square. For any fixed dimension $d$, our next theorem shows that $A'_d(n) = \Theta\left(\frac{1}{n}\right)$, too:

Theorem 5. For a fixed $d$, $A'_d(n) = \Theta\left(\frac{1}{n}\right)$. More precisely,

$$\frac{1}{(n^{1/d} + 1)^d} \le A'_d(n) \le \frac{1}{(\lfloor n^{1/d} \rfloor + 1)^d}. \tag{22}$$

Proof. We will prove the bounds for the planar case $d = 2$:

$$\frac{1}{(\sqrt{n} + 1)^2} \le A'(n) \le \frac{1}{(\lfloor \sqrt{n} \rfloor + 1)^2}.$$

The proof can be easily generalized for $d \ge 3$.

We first prove the lower bound. Let $S$ be a set of $n$ points in the unit square $U$. Let $x$ be a positive
number to be determined. Let $X$ be an axis-parallel square of side $1 - x$ that is concentric with $U$. For each
point $p \in S$, place an axis-parallel (open) square of side $x$ centered at $p$. If there is a point $q \in X$ that is
not covered by the union of the $n$ squares, then the axis-parallel square of side $x$ centered at $q$ is an empty
square contained in $U$.

The area of $X$ is $(1 - x)^2$. The total area of $n$ squares of side $x$ is $n x^2$. Let $x$ be the solution to the
following equation

$$(1 - x)^2 = n x^2.$$

The solution is $x = \frac{1}{\sqrt{n} + 1}$. For this value of $x$, either the $n$ small squares cover $X$ with no interior overlap
among themselves, or there is interior overlap and they don't cover $X$. In either case, there exists an open
axis-parallel square of side length $x$, centered at a point in $X$, and empty of points in $S$. Consequently,

$$A'(n) \ge x^2 = \frac{1}{(\sqrt{n} + 1)^2}.$$



We next prove the upper bound. Let $k = \lfloor \sqrt{n} \rfloor$. Note that $n \ge k^2$. Partition the unit square $U$ into a
$(k + 1) \times (k + 1)$ square grid of cell length $1/(k + 1)$. Place a point at each of the $k^2$ grid vertices in the
interior of $U$; the remaining $n - k^2$ points can be placed arbitrarily. Then any axis-parallel square contained in $U$
whose side is longer than $1/(k + 1)$ must be non-empty. Consequently,

$$A'(n) \le \frac{1}{(\lfloor \sqrt{n} \rfloor + 1)^2}.$$
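
The grid construction can be verified on small instances. The sketch below builds the point set (the helper names `grid_points` and `largest_empty_square_side` are ours) and computes the side of a largest empty square by brute force with exact rational arithmetic. The brute force assumes $n \ge 1$ and uses the fact that a largest empty square can be slid left and then down until its left and bottom edges each touch a point or the boundary of $U$.

```python
import math
from fractions import Fraction


def grid_points(n):
    """The k^2 interior vertices of the (k+1) x (k+1) grid, k = floor(sqrt(n)),
    padded to n points (the extra points duplicate one grid vertex)."""
    k = math.isqrt(n)
    step = Fraction(1, k + 1)
    pts = [(i * step, j * step)
           for i in range(1, k + 1) for j in range(1, k + 1)]
    pts += [pts[0]] * (n - k * k)
    return pts


def largest_empty_square_side(points):
    """Side of a largest open axis-parallel square in [0,1]^2 with no point inside."""
    lefts = {Fraction(0)} | {px for px, _ in points}
    bottoms = {Fraction(0)} | {py for _, py in points}
    best = Fraction(0)
    for l in lefts:
        for b in bottoms:
            s = min(1 - l, 1 - b)                 # stay inside U
            for px, py in points:
                if px > l and py > b:
                    # p stays outside the open square iff s <= max(dx, dy)
                    s = min(s, max(px - l, py - b))
            best = max(best, s)
    return best


if __name__ == "__main__":
    for n in (4, 9, 12, 16):
        print(n, largest_empty_square_side(grid_points(n)))
```

For $n = 9$ and $n = 12$ the computed side is $1/4$, matching $1/(\lfloor \sqrt{n} \rfloor + 1)$.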



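It remains to show that (22) implies that for a fixed $d$, we have $A'_d(n) = \Theta\left(\frac{1}{n}\right)$. The following inequalities
are straightforward for $n \ge 1$:

$$(n^{1/d} + 1)^d \le (2 n^{1/d})^d = 2^d n \qquad \text{and} \qquad (\lfloor n^{1/d} \rfloor + 1)^d \ge n.$$

Putting them together yields

$$\frac{1}{2^d n} \le A'_d(n) \le \frac{1}{n},$$

as claimed. □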
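
As a numerical sanity check of the step from (22) to the $\Theta(1/n)$ estimate, the following Python sketch evaluates both sides of (22) and compares them with $1/(2^d n)$ and $1/n$. The function name `bounds_22` is ours, and the integer root is computed with an explicit correction loop because floating-point $d$-th roots of perfect powers can be off by one unit in the last place.

```python
def bounds_22(n, d):
    """Return (lower, upper): the two sides of inequality (22) for given n, d."""
    # Integer d-th root k = floor(n^(1/d)), corrected for rounding errors.
    k = round(n ** (1.0 / d))
    while k ** d > n:
        k -= 1
    while (k + 1) ** d <= n:
        k += 1
    lower = 1.0 / (n ** (1.0 / d) + 1.0) ** d
    upper = 1.0 / (k + 1) ** d
    return lower, upper


if __name__ == "__main__":
    for d in (2, 3, 5):
        for n in (1, 10, 1000, 10 ** 6):
            lower, upper = bounds_22(n, d)
            # Both ratios stay within [1 / 2^d, 1], independent of n.
            print(d, n, lower * n, upper * n)
```

The printed products $n \cdot$ bound stay between $2^{-d}$ and $1$, so the hidden constants depend on $d$ only.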

C A $(1 - \varepsilon)$-approximation algorithm for finding the largest empty hypercube

Let $R$ be an axis-parallel $d$-dimensional hypercube in $\mathbb{R}^d$ containing $n$ points. In this section, we present an
efficient $(1 - \varepsilon)$-approximation algorithm for computing a maximum-volume empty axis-parallel hypercube
contained in $R$.

Theorem 6. Given an axis-parallel $d$-dimensional hypercube $R$ in $\mathbb{R}^d$ containing $n$ points, there is a $(1 - \varepsilon)$-approximation
algorithm, running in

CI — ■ n log n + I — 1 ■ n ' log n I 

time, for computing a maximum-volume empty axis-parallel hypercube contained in R. 

Proof. The overall structure of the algorithm is similar to that for finding the largest empty box. We can
assume w.l.o.g. that $R = U_d = [0, 1]^d$, $n \ge 12$, and $d \ge 3$. Recall that, by Theorem 5, the volume of a
largest empty hypercube in $U_d$ is at least $(n^{1/d} + 1)^{-d}$. We set the parameters $\delta$, $m$ and $a$ as in equation (2).
Inequalities (HJl and dS]) also follow. Let now k be the unique positive integer such that 

$$a^{k-1} < n^{1/d} + 1 \le a^k. \tag{23}$$

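Thus

$$k - 1 < \frac{\log(n^{1/d} + 1)}{\log a}.$$

Since $n \ge 12$ and $d \ge 3$ we have

$$\frac{\log(n^{1/d} + 1)}{\log a} \le \frac{1 + \frac{1}{d} \log n}{\log a} \le \frac{\left(\frac{1}{3} + \frac{1}{d}\right) \log n}{\log a} \le \frac{2 \log n}{3 \log a} \le \frac{2 \log n \, \ln 2}{3 \cdot 0.9\,\delta} \le \frac{3 \log n}{5 \delta}.$$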

It follows that

$$k \le 1 + \frac{3 \log n}{5 \delta} \le \frac{\log n}{\delta} = \frac{2d}{\varepsilon} \log n. \tag{24}$$
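
The bound (24) on $k$ can be checked numerically. The sketch below is only an illustration under two assumptions that are not restated in this section: that the parameters of equation (2) are $\delta = \varepsilon/(2d)$ and $a = 1 + \delta$ (this is how we read the last equality in (24)), and that $\log$ denotes the base-2 logarithm. The function names are ours.

```python
import math


def num_canonical_sides(n, d, eps):
    """The unique positive integer k with a^(k-1) < n^(1/d) + 1 <= a^k, as in (23).

    Assumption (not stated in this section): delta = eps / (2d), a = 1 + delta.
    """
    delta = eps / (2 * d)
    a = 1.0 + delta
    target = n ** (1.0 / d) + 1.0
    k = max(1, math.ceil(math.log(target) / math.log(a)))
    # Correct possible rounding errors of the logarithms.
    while a ** k < target:
        k += 1
    while k > 1 and a ** (k - 1) >= target:
        k -= 1
    return k


def bound_24(n, d, eps):
    """Right-hand side of (24): (2d / eps) * log2(n)."""
    return (2 * d / eps) * math.log2(n)


if __name__ == "__main__":
    for n, d, eps in [(12, 3, 0.5), (10 ** 4, 3, 0.1), (10 ** 6, 8, 0.01)]:
        print(n, d, eps, num_canonical_sides(n, d, eps), bound_24(n, d, eps))
```

In these examples $k$ is well below the bound, which is consistent with the slack in the chain of inequalities leading to (24).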

Consider the set H of k canonical hypercubes whose sides are elements of X (as in (H)): 

'^' = |^,^ = 0,l,...,A;-l|. (25) 

For a given canonical hypercube $H_0 \in \mathcal{H}$, with side $X \in \mathcal{X}$, consider the canonical grid associated
with $H_0$, with points of coordinates

$$\left(\frac{i_1 X}{m}, \ldots, \frac{i_d X}{m}\right), \qquad i_1, \ldots, i_d \ge 0, \tag{26}$$

contained in $U_d$.

Consider the set $\mathcal{I}$ of $k + 1$ intervals (as in (10)):




0,1,..., k}. (27) 



Let $H$ be a maximum-volume empty hypercube in $R = U_d$, with side length $Z$ and $V_{\max} = \mathrm{vol}(H)$.
Observe that $Z \ge \frac{1}{a^k}$; indeed, $Z < \frac{1}{a^k}$ would imply that

$$Z^d < \frac{1}{a^{kd}} \le \frac{1}{(n^{1/d} + 1)^d},$$

in contradiction to the lower bound in Theorem 5. This means that $Z$ belongs to one of the last $k$ intervals
in the set $\mathcal{I}$. That is, there exists an integer $y \in \{0, 1, \ldots, k - 1\}$, such that

Analogous to Lemma 3, we conclude that $R$ contains a large canonical hypercube, say $H_0$, whose side is



at some position in the canonical grid associated with it. Analogous to Lemma 4, we show that $\mathrm{vol}(H_0) \ge
(1 - \varepsilon) \cdot \mathrm{vol}(H)$. By (28) and (29),

-l(^o)=(^J =^-(^) >^-vol(F)>(l-.).voW, 

since the setting of $a$ is the same as before. Analogous to Lemma 5, now (24) is the upper bound we need
on the number of canonical hypercubes. The bound in Lemma 6 needs to be adjusted because $k$ is chosen
differently, and we have a different upper bound on the third factor in the product, $a^k$. From the definition
of $k$ in (23) and from (5) we deduce

a'' = a- a^-^ < a{n^l'^ + 1) < 2an^/'^ < — n^/"^. 

5 

The resulting bound analogous to that in Lemma 6 is now

M(Go) < e"/^ ' • ^n^l" < 22 • " ■ n^'^. (30) 

The time taken to test $H_0$ for emptiness and containment in $R$ when placed at all relevant grid positions
is now

0{d-n + M{Go) + 2'^-M{Go)) = o(^dn + 2'^- (^^^ ■ n^/'^^ = O (^dn + (^^^ ■ n^^'^^ . 

By multiplying this with the upper bound in (24) on the number of canonical hypercubes, we get that
the total running time of the approximation algorithm is

CI — ■ n log n + I — 1 ■ n ' log n 1 . 

The proof of Theorem 6 is now complete. □



D An asymptotically tight bound on the number of restricted boxes 

In this section we prove the following theorem: 

Theorem 7. Let $U_d$ be the unit hypercube $[0, 1]^d$. For any $n > 0$, there exist $n$ points in $U_d$ such that the
number of restricted boxes in $U_d$ is at least $(\lfloor n/d \rfloor + 1)^d$. On the other hand, the number of restricted boxes
determined by any set of $n$ points in $U_d$ is at most $\binom{n}{d} \cdot \binom{2d}{d}$.

We prove the lower bound in Theorem 7 by construction. We will use the following lemma:

Lemma 11. Let $n = \sum_{i=1}^{d} n_i$, where $n_i \ge 2$, $1 \le i \le d$. Then there exist $n$ points in $\mathbb{R}^d$ such that the
number of maximal empty axis-parallel boxes in $\mathbb{R}^d$ is at least $\prod_{i=1}^{d} (n_i - 1)$.

Proof. Let $\pm x_1, \ldots, \pm x_d$ be the positive and negative unit vectors along the $d$ axes of $\mathbb{R}^d$. Partition these
$2d$ vectors into $d$ groups of orthogonal vectors,

$$\{+x_1, -x_2\},\ \{+x_2, -x_3\},\ \ldots,\ \{+x_{d-1}, -x_d\},\ \{+x_d, -x_1\},$$

with one positive vector and one negative vector in each group. Then, for each group of two orthogonal
vectors, say $\{+x_i, -x_j\}$, place a sequence of $n_i$ points in $\mathbb{R}^d$ as

$$k x_i - (n_i + 1 - k) x_j, \qquad 1 \le k \le n_i,$$

where each pair of consecutive points in the sequence, say

$$(a - 1) x_i - b x_j \quad \text{and} \quad a x_i - (b - 1) x_j,$$

corresponds to a pair of open half-spaces

$$x_i < a \quad \text{and} \quad x_j > -b.$$

Consider the pair of open half-spaces $x_i < a$ and $x_j > -b$ corresponding to the pair of consecutive
points in the sequence for the group $\{+x_i, -x_j\}$. Since the points in the sequence have monotonic $x_i$
and $x_j$ coordinates, we have property (i) that the intersection of the two half-spaces contains no points in
the sequence, and property (ii) that each of the two points is on the boundary of one half-space and is in
the interior of the other half-space. Moreover, since the $x_i$ and $x_j$ coordinates of the points in the other
sequences are either zero or different in sign from the points in this sequence, we have (iii) that each of the
two half-spaces contains all points in the other sequences. There are $\prod_{i=1}^{d} (n_i - 1)$ combinations of $d$ pairs of
consecutive points, one pair from each sequence. Consider the intersection of the $d$ pairs of half-spaces
corresponding to any of these combinations. By (i), the intersection must be empty. By (ii) and (iii),
there is a point in the interior of each bounding face, thus the intersection box must be maximal. Hence
for each combination, the intersection of the corresponding $d$ pairs of half-spaces is a unique maximal empty
axis-parallel box.
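
The construction in this proof is easy to replay on small inputs. The following Python sketch (with our own helper names `construct`, `boxes`, `is_empty` and `is_maximal`, and assuming $d \ge 2$ and every $n_i \ge 2$) builds the $d$ sequences of points, forms the box for every combination of consecutive pairs, and checks emptiness and maximality directly from the definitions used above.

```python
from itertools import product
from math import prod


def construct(ns):
    """Point sequences of the construction; group i is {+x_i, -x_j}, j = i + 1 (cyclically)."""
    d = len(ns)
    seqs = []
    for i, n_i in enumerate(ns):
        j = (i + 1) % d
        seq = []
        for k in range(1, n_i + 1):
            p = [0] * d
            p[i] = k                  # coefficient of +x_i
            p[j] = -(n_i + 1 - k)     # coefficient of -x_j
            seq.append(tuple(p))
        seqs.append(seq)
    return seqs


def boxes(ns):
    """One open box, as a tuple of (low, high) per axis, for each combination of pairs."""
    d = len(ns)
    result = []
    for ks in product(*(range(1, n_i) for n_i in ns)):
        lo, hi = [None] * d, [None] * d
        for i, k in enumerate(ks):    # pair formed by the k-th and (k+1)-th point
            j = (i + 1) % d
            hi[i] = k + 1                  # half-space x_i < a with a = k + 1
            lo[j] = -(ns[i] + 1 - k)       # half-space x_j > -b with b = n_i + 1 - k
        result.append(tuple(zip(lo, hi)))
    return result


def is_empty(box, points):
    return not any(all(l < c < h for c, (l, h) in zip(p, box)) for p in points)


def is_maximal(box, points):
    """Every one of the 2d faces has a point in its relative interior."""
    d = len(box)
    for t in range(d):
        for bound in box[t]:
            if not any(p[t] == bound and
                       all(box[u][0] < p[u] < box[u][1] for u in range(d) if u != t)
                       for p in points):
                return False
    return True


if __name__ == "__main__":
    ns = (3, 4)
    pts = [p for seq in construct(ns) for p in seq]
    for b in boxes(ns):
        print(b, is_empty(b, pts), is_maximal(b, pts))
```

For `ns = (3, 4)` the sketch reports $2 \cdot 3 = 6$ boxes, all empty and maximal.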

Figure 2: An example of the construction.

We refer to Fig. 2 for an example of the planar case. For $n_1 = 3$, $n_2 = 4$, and $n = 7$, the four unit
vectors $\pm x$ and $\pm y$ are grouped into $\{+x, -y\}$ and $\{+y, -x\}$. The corresponding two sequences of points
have the following $(x, y)$-coordinates:

(1,-3) (2,-2) (3,-1) 
(-4,1) (-3,2) (-2,3) (-1,4). 

Then the following two pairs of consecutive points 

(1,-3) (2,-2) 
(-3,2) (-2,3) 

correspond to the following two pairs of half-planes: 

$x < 2$ and $y > -3$
$y < 3$ and $x > -3$

whose intersection is the maximal empty box $(-3, 2) \times (-3, 3)$. □

By scaling and translation, the $n$ points in Lemma 11 can be placed in the unit hypercube $U_d = [0, 1]^d$
such that the number of restricted boxes inside $U_d$ is at least $\prod_{i=1}^{d} (\lfloor n/d \rfloor + 1) = (\lfloor n/d \rfloor + 1)^d$, where the
change from $-1$ to $+1$ in the product accounts for the two bounding faces of the unit hypercube perpendicular
to each axis. This proves the lower bound. The same lower bound was obtained independently and
simultaneously by Backer and Keil [5, 6].

To prove the upper bound in Theorem 7, we borrow the deflation-inflation idea of Backer and Keil [5, 6].
Assume for simplicity that the points have distinct coordinates along each axis (it is possible to perturb the
points symbolically so this condition holds). Let $B$ be an arbitrary restricted box. Consider the $2d$ faces of
the box in any fixed order. If a face contains a point in its interior, deflate the box by pushing the face toward
its opposite face until it contains a point on its boundary. After $d$ such deflations, we obtain an empty box
$B' \subseteq B$ that is the smallest box containing exactly $d$ points on its boundary. To recover the original box
$B$ from $B'$, it suffices to inflate the box at the $d$ faces in reverse order, by pushing each face away from its
opposite face until it contains a point in its interior. Therefore the number of restricted boxes $B$ is at most
the number of deflated boxes $B'$ times the number of subsets of $d$ deflated faces, that is, $\binom{n}{d} \cdot \binom{2d}{d}$. Since
$\binom{n}{d} \le n^d / d!$ and $\binom{2d}{d} = (2d)! / (d!)^2$, we have

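$$\binom{n}{d} \binom{2d}{d} \le n^d \, \frac{(2d)!}{(d!)^3}.$$

By Stirling's formula, $d! = \sqrt{2 \pi d} \, (d/e)^d \, (1 + O(1/d))$, hence

$$\frac{(2d)!}{(d!)^3} = \frac{\sqrt{2 \pi \cdot 2d} \, (2d/e)^{2d}}{\left(\sqrt{2 \pi d} \, (d/e)^d\right)^3} \, (1 \pm O(1/d)) = \frac{(4e/d)^d}{\sqrt{2} \, \pi d} \, (1 \pm O(1/d)).$$

Thus

$$\binom{n}{d} \binom{2d}{d} \le \frac{(4en/d)^d}{\sqrt{2} \, \pi d} \, (1 \pm O(1/d)).$$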
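
The counting bound and its Stirling estimate can be checked numerically. The sketch below (function names are ours) verifies $\binom{n}{d}\binom{2d}{d} \le n^d (2d)!/(d!)^3$ with exact integers, and compares $(2d)!/(d!)^3$ with the closed form $(4e/d)^d / (\sqrt{2}\,\pi d)$ obtained from Stirling's formula, using `math.lgamma` to avoid overflow for larger $d$.

```python
import math


def count_bound(n, d):
    """Upper bound C(n, d) * C(2d, d) on the number of restricted boxes."""
    return math.comb(n, d) * math.comb(2 * d, d)


def factorial_bound_holds(n, d):
    """Exact integer check of C(n,d) * C(2d,d) <= n^d * (2d)! / (d!)^3."""
    return count_bound(n, d) * math.factorial(d) ** 3 <= n ** d * math.factorial(2 * d)


def stirling_ratio(d):
    """Ratio of (2d)! / (d!)^3 to its estimate (4e/d)^d / (sqrt(2) * pi * d)."""
    log_exact = math.lgamma(2 * d + 1) - 3 * math.lgamma(d + 1)
    log_estimate = d * math.log(4 * math.e / d) - math.log(math.sqrt(2) * math.pi * d)
    return math.exp(log_exact - log_estimate)


if __name__ == "__main__":
    for d in (2, 5, 20, 100):
        # The ratio tends to 1 as d grows, in line with the 1 +/- O(1/d) factor.
        print(d, stirling_ratio(d), factorial_bound_holds(50 * d, d))
```

The same helper also makes the comparison with a bound of the form $n^d \cdot 2^{2d}$ concrete, since $\binom{n}{d}\binom{2d}{d} \le n^d \cdot 4^d$ for every $n \ge d \ge 1$.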

Our upper bound is sharper (with respect to the dependence on $d$) than the upper bound of $O(n^d) \cdot 2^{2d}$
by Backer and Keil [5, 6]. The ratio of our upper bound to the lower bound is






In comparison, the ratio of their upper bound to the same lower bound is 